Calgary Property Assessments

House Prices: Advanced Regression Techniques

In this article, we use a dataset from Kaggle.com.

Competition Description

Ask a home buyer to describe their dream house, and they probably won't begin with the height of the basement ceiling or the proximity to an east-west railroad. But this playground competition's dataset proves that much more influences price negotiations than the number of bedrooms or a white-picket fence.

With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home.

Data Description

File descriptions

Data fields

Here's a brief version of what you'll find in the data description file.

Feature        Description
SalePrice      The property's sale price in dollars
MSSubClass     The building class
MSZoning       The general zoning classification
LotFrontage    Linear feet of street connected to property
LotArea        Lot size in square feet
Street         Type of road access
Alley          Type of alley access
LotShape       General shape of property
LandContour    Flatness of the property
Utilities      Type of utilities available
LotConfig      Lot configuration
LandSlope      Slope of property
Neighborhood   Physical locations within Ames city limits
Condition1     Proximity to main road or railroad
Condition2     Proximity to main road or railroad (if a second is present)
BldgType       Type of dwelling
HouseStyle     Style of dwelling
OverallQual    Overall material and finish quality
OverallCond    Overall condition rating
YearBuilt      Original construction date
YearRemodAdd   Remodel date
RoofStyle      Type of roof
RoofMatl       Roof material
Exterior1st    Exterior covering on house
Exterior2nd    Exterior covering on house (if more than one material)
MasVnrType     Masonry veneer type
MasVnrArea     Masonry veneer area in square feet
ExterQual      Exterior material quality
ExterCond      Present condition of the material on the exterior
Foundation     Type of foundation
BsmtQual       Height of the basement
BsmtCond       General condition of the basement
BsmtExposure   Walkout or garden level basement walls
BsmtFinType1   Quality of basement finished area
BsmtFinSF1     Type 1 finished square feet
BsmtFinType2   Quality of second finished area (if present)
BsmtFinSF2     Type 2 finished square feet
BsmtUnfSF      Unfinished square feet of basement area
TotalBsmtSF    Total square feet of basement area
Heating        Type of heating
HeatingQC      Heating quality and condition
CentralAir     Central air conditioning
Electrical     Electrical system
1stFlrSF       First floor square feet
2ndFlrSF       Second floor square feet
LowQualFinSF   Low quality finished square feet (all floors)
GrLivArea      Above grade (ground) living area square feet
BsmtFullBath   Basement full bathrooms
BsmtHalfBath   Basement half bathrooms
FullBath       Full bathrooms above grade
HalfBath       Half baths above grade
Bedroom        Number of bedrooms above basement level
Kitchen        Number of kitchens
KitchenQual    Kitchen quality
TotRmsAbvGrd   Total rooms above grade (does not include bathrooms)
Functional     Home functionality rating
Fireplaces     Number of fireplaces
FireplaceQu    Fireplace quality
GarageType     Garage location
GarageYrBlt    Year garage was built
GarageFinish   Interior finish of the garage
GarageCars     Size of garage in car capacity
GarageArea     Size of garage in square feet
GarageQual     Garage quality
GarageCond     Garage condition
PavedDrive     Paved driveway
WoodDeckSF     Wood deck area in square feet
OpenPorchSF    Open porch area in square feet
EnclosedPorch  Enclosed porch area in square feet
3SsnPorch      Three season porch area in square feet
ScreenPorch    Screen porch area in square feet
PoolArea       Pool area in square feet
PoolQC         Pool quality
Fence          Fence quality
MiscFeature    Miscellaneous feature not covered in other categories
MiscVal        Value of miscellaneous feature
MoSold         Month sold
YrSold         Year sold
SaleType       Type of sale
SaleCondition  Condition of sale

Distribution of Observations


Data Correlations

Let's take a look at the variance of the features.
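
As a rough illustration of how per-feature variance might be inspected with pandas, the sketch below uses a tiny made-up frame. The column names come from the data fields table, but the values are invented for the example; with the real data one would load the competition's training file instead.

```python
import pandas as pd

# Made-up illustrative rows (NOT taken from the competition data).
df = pd.DataFrame({
    "LotArea": [8000, 9500, 12000, 10500],
    "GrLivArea": [1500, 1200, 1800, 1700],
    "OverallQual": [7, 6, 8, 7],
    "Street": ["Pave", "Pave", "Grvl", "Pave"],
})

# Variance is only meaningful for numeric columns, so select those first.
numeric = df.select_dtypes(include="number")

# Sample variance per feature, largest first.
variances = numeric.var().sort_values(ascending=False)
print(variances)
```

Features measured in square feet tend to sit at the top of such a listing simply because of their units, which is one motivation for rescaling before modeling.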

Features with very high variance can dominate features on smaller scales and hurt the modeling process. For this reason, we standardize numeric features by removing the mean and scaling to unit variance. To prepare the data for end-to-end modeling, we separate the numeric and categorical features and treat each group separately.
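
One way to treat the two groups separately in scikit-learn is a `ColumnTransformer`. The sketch below is an assumption about how this could be set up, not the article's exact pipeline: numeric columns go through `StandardScaler`, and categorical columns are one-hot encoded (the choice of one-hot encoding is ours). The sample values are made up.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up illustrative rows using column names from the data fields table.
X = pd.DataFrame({
    "LotArea": [8000.0, 9500.0, 12000.0, 10500.0],
    "GrLivArea": [1500.0, 1200.0, 1800.0, 1700.0],
    "Street": ["Pave", "Pave", "Grvl", "Pave"],
    "CentralAir": ["Y", "N", "Y", "Y"],
})

preprocess = ColumnTransformer([
    # Numeric features: remove the mean and scale to unit variance.
    ("num", StandardScaler(), make_column_selector(dtype_include=np.number)),
    # Categorical features: one-hot encode (an assumed choice of encoder).
    ("cat", OneHotEncoder(handle_unknown="ignore"),
     make_column_selector(dtype_exclude=np.number)),
])

Xt = preprocess.fit_transform(X)
# The result may be sparse or dense depending on its density; normalize to dense.
Xt = Xt.toarray() if hasattr(Xt, "toarray") else np.asarray(Xt)
print(Xt.shape)
```

The first two output columns are the standardized numeric features, followed by the indicator columns for each category level.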

Modeling

Regressors

In this section, we test a number of efficient scikit-learn regressors. Then from those that have performed well, a stacked model can be formed. In particular, we use the following models:

Regressor                    Link
Bagging Regressor            sklearn.ensemble.BaggingRegressor
Decision Tree Regressor      sklearn.tree.DecisionTreeRegressor
Gradient Boosting Regressor  sklearn.ensemble.GradientBoostingRegressor
MLP Regressor                sklearn.neural_network.MLPRegressor
Random Forest Regressor      sklearn.ensemble.RandomForestRegressor

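The comparison and stacking steps described above could be sketched as follows. Everything specific here is an assumption made for illustration: the data is synthetic (`make_regression` stands in for the preprocessed housing features), the hyperparameters are arbitrary small values, cross-validated R^2 is used as the score, the two best models are stacked, and `RidgeCV` is used as the final estimator.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    BaggingRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the preprocessed features and target (assumption).
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# The five regressors listed above, with small illustrative settings.
models = {
    "bagging": BaggingRegressor(n_estimators=20, random_state=0),
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "gradient_boosting": GradientBoostingRegressor(n_estimators=50, random_state=0),
    "mlp": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
    ),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Mean 3-fold cross-validated R^2 for each model.
scores = {
    name: cross_val_score(model, X, y, cv=3, scoring="r2").mean()
    for name, model in models.items()
}
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:18s} mean R^2 = {score:.3f}")

# Stack the two best-scoring models; StackingRegressor clones and refits them.
top = sorted(scores, key=scores.get, reverse=True)[:2]
stack = StackingRegressor(
    estimators=[(name, models[name]) for name in top],
    final_estimator=RidgeCV(),
    cv=3,
)
stack.fit(X, y)
print("stacked:", top)
```

How many models to stack, and which metric decides "performed well", are choices the text leaves open; the cutoff of two is only for brevity.
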
Moreover, the following table highlights the performance of the regressors.

Final Predictions


References

  1. Kaggle Dataset: House Prices - Advanced Regression Techniques